Application and Development of Computational Methods for Ligand-Based Virtual Screening
The detection of novel active compounds that are able to modulate the biological function of a target is the primary goal of drug discovery. Different screening methods are available to identify hit compounds with the desired bioactivity in large collections of molecules. As a computational method, virtual screening (VS) is used to search compound libraries in silico and identify those compounds that are likely to exhibit a specific activity. Ligand-based virtual screening (LBVS) is a subdiscipline that uses the information of one or more known active compounds to identify new hit compounds. Different LBVS methods exist, e.g., similarity searching and support vector machines (SVMs). To enable the application of these computational approaches, compounds have to be described numerically. Fingerprints derived from the two-dimensional compound structure, called 2D fingerprints, are among the most popular molecular descriptors available. This thesis covers the use of 2D fingerprints in the context of LBVS. The first part focuses on a detailed analysis of 2D fingerprints. Their performance across a wide range of pharmaceutical targets is globally estimated through fingerprint-based similarity searching. Additionally, mechanisms by which fingerprints are capable of detecting structurally diverse active compounds are identified. For this purpose, two different feature selection methods are applied to find those fingerprint features that are most relevant for the active compounds and distinguish them from other compounds. Then, 2D fingerprints are used in SVM calculations. The SVM methodology provides several opportunities to include additional information about the compounds in order to direct LBVS search calculations. First, a variant of the SVM approach is applied to the multi-class prediction problem involving compounds that are active against several related targets. SVM linear combination is used to recover compounds with desired activity profiles and deprioritize compounds with other activities. Then, the SVM methodology is adapted for potency-directed VS. Compound potency is incorporated into the SVM approach through potency-oriented SVM linear combination and kernel function design to direct search calculations toward the preferential detection of potent hit compounds. Next, SVM calculations are applied to address an intrinsic limitation of similarity-based methods, i.e., the presence of similar compounds having large differences in their potency. A specially designed SVM approach is introduced to predict compound pairs forming such activity cliffs. Finally, the impact of different training sets on the recall performance of SVM-based VS is analyzed and caveats are identified.
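Fingerprint-based similarity searching, as used in the first part of the thesis, can be illustrated with a minimal sketch. It assumes that each 2D fingerprint is stored as the set of its "on" bit positions, compares fingerprints with the Tanimoto coefficient, and combines several reference actives by taking the maximum similarity (max fusion). The function names `tanimoto` and `similarity_search` are hypothetical, and the thesis may use other fingerprints, similarity metrics, or fusion rules.

```python
from typing import Dict, FrozenSet, List, Tuple

# A fingerprint is represented here as the set of indices of its "on" bits.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for two bit sets."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints: treat as dissimilar
    return len(a & b) / union


def similarity_search(
    references: List[Fingerprint],
    database: Dict[str, Fingerprint],
    top_n: int = 10,
) -> List[Tuple[str, float]]:
    """Rank database compounds by their maximum Tanimoto similarity to any
    reference active (max fusion) and return the top_n (id, score) pairs."""
    scored = [
        (cid, max(tanimoto(fp, ref) for ref in references))
        for cid, fp in database.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]
```

With one reference `{1, 2, 3, 4}` and a toy database `{"A": {1, 2, 3}, "B": {5, 6}, "C": {1, 2, 3, 4}}`, compound C is ranked first (similarity 1.0) and A second (0.75).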
Comparison of Confirmed Inactive and Randomly Selected Compounds as Negative Training Examples in Support Vector Machine-Based Virtual Screening
The choice of negative training data for machine learning is a little-explored issue in chemoinformatics. In this study, the influence of alternative sets of negative training data and different background databases on support vector machine (SVM) modeling and virtual screening has been investigated. Target-directed SVM models have been derived on the basis of differently composed training sets containing confirmed inactive molecules or randomly selected database compounds as negative training instances. These models were then applied to search background databases consisting of biological screening data or randomly assembled compounds for available hits. Negative training data were found to systematically influence compound recall in virtual screening. In addition, different background databases had a strong influence on the search results. Our findings also indicated that typical benchmark settings lead to an overestimation of SVM-based virtual screening performance compared to search conditions that are more relevant for practical applications.
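Compound recall, the measure used above to compare negative training sets, can be computed from any ranked screening result. The sketch below assumes that each model has already assigned a score (for example, an SVM decision value) to every database compound; `rank_by_score` and `recall_at_n` are hypothetical helper names, and the cutoff `n` is a free parameter rather than a value taken from the study.

```python
from typing import Dict, List, Set


def rank_by_score(scores: Dict[str, float]) -> List[str]:
    """Order compound ids from highest to lowest model score."""
    return sorted(scores, key=scores.get, reverse=True)


def recall_at_n(ranked_ids: List[str], active_ids: Set[str], n: int) -> float:
    """Fraction of known actives found among the top-n ranked compounds."""
    if not active_ids:
        raise ValueError("active_ids must not be empty")
    retrieved = sum(1 for cid in ranked_ids[:n] if cid in active_ids)
    return retrieved / len(active_ids)
```

Computing `recall_at_n` for two models trained with different negatives (confirmed inactives vs. random compounds) on the same background database gives a like-for-like comparison; repeating it on a second background database shows how strongly the database composition shifts the result.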
Prediction of Compounds in Different Local SAR Environments using ECP
SD files of the 15 data sets reported in the manuscript are provided. Each data set is identified by its ChEMBL target ID. The file format is described in the file 'description.txt'.
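The data item fields in SD files like these can be read with a small standard-library parser. This is a minimal sketch: the field names in the example are hypothetical, and the actual fields and their meaning are documented in 'description.txt'. It only extracts `> <NAME>` data items and ignores the molfile connection table. For full structure handling, a cheminformatics toolkit such as RDKit (`Chem.SDMolSupplier`) is the usual choice.

```python
import re
from typing import Dict, List

# Matches a data item header such as ">  <pKi>" and captures the field name.
_FIELD_HEADER = re.compile(r"^>.*<([^>]+)>")


def read_sd_properties(text: str) -> List[Dict[str, str]]:
    """Return one dict of data item fields per SD record.

    Records are separated by '$$$$'. A data item is a header line starting
    with '>' that holds the field name in angle brackets, followed by value
    lines up to the next blank line.
    """
    records: List[Dict[str, str]] = []
    for block in text.split("$$$$"):
        if not block.strip():
            continue  # skip whitespace after the last record
        props: Dict[str, str] = {}
        lines = block.splitlines()
        i = 0
        while i < len(lines):
            match = _FIELD_HEADER.match(lines[i])
            i += 1
            if match is None:
                continue  # molfile lines (counts, atoms, bonds, "M  END") are ignored
            values = []
            while i < len(lines) and lines[i].strip():
                values.append(lines[i].strip())
                i += 1
            props[match.group(1)] = "\n".join(values)
        records.append(props)
    return records
```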
Compound Pathway Model To Capture SAR Progression: Comparison of Activity Cliff-Dependent and -Independent Pathways
A compound pathway model is introduced to monitor SAR progression in compound data sets. Pathways are formed by sequences of structurally analogous compounds with stepwise increasing potency that ultimately yield highly potent compounds. Hence, the model was designed to mimic compound optimization efforts. Different pathway categories were defined. Pathways originating from any active compound in a data set, including compounds forming activity cliffs, were systematically identified. The relative frequency of activity cliff-dependent and -independent pathways was determined and compared. In 23 of 39 different compound data sets that qualified for our analysis, significant differences in the relative frequency of activity cliff-dependent and -independent pathways were observed. In 17 of these 23 data sets, activity cliff-dependent pathways occurred with higher relative frequency than cliff-independent pathways. In addition, pathways originating from the majority of activity cliff compounds displayed desired SAR progression, reflecting SAR information gain associated with activity cliffs.
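The pathway idea above (sequences of analogs with stepwise increasing potency) can be sketched as a depth-first enumeration over an analog network. Assumptions, all hypothetical: potencies are given as logarithmic values, `analogs` is a symmetric map of precomputed analog relationships, a pathway is extended until no more potent analog remains, and an activity cliff step is a potency increase of at least 2 log units (`CLIFF_DELTA`). The published model defines its pathway categories and cliff criteria in its own way; this only shows the general mechanics.

```python
from typing import Dict, List, Set

# Hypothetical activity cliff criterion: a potency increase of at least
# 2 log units (100-fold) between consecutive analogs.
CLIFF_DELTA = 2.0


def enumerate_pathways(
    start: str,
    potency: Dict[str, float],
    analogs: Dict[str, Set[str]],
) -> List[List[str]]:
    """Return all maximal pathways starting at `start`.

    Each step moves to a structural analog with strictly higher potency,
    so potency increases along every path and cycles cannot occur.
    """
    next_steps = [
        other
        for other in sorted(analogs.get(start, set()))
        if potency[other] > potency[start]
    ]
    if not next_steps:
        return [[start]]  # no more potent analog: the pathway ends here
    pathways: List[List[str]] = []
    for other in next_steps:
        for tail in enumerate_pathways(other, potency, analogs):
            pathways.append([start] + tail)
    return pathways


def is_cliff_dependent(path: List[str], potency: Dict[str, float]) -> bool:
    """True if any step along the path is an activity cliff."""
    return any(
        potency[b] - potency[a] >= CLIFF_DELTA for a, b in zip(path, path[1:])
    )
```

Counting cliff-dependent and cliff-independent pathways over all start compounds then gives their relative frequencies for a data set.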
Computational polypharmacology analysis of the heat shock protein 90 interactome
The design of a single drug molecule that is able to simultaneously and specifically interact with multiple biological targets is receiving increasing attention in drug discovery. However, the rational design of drugs with a desired polypharmacology profile is still a challenging task, especially when these targets are distantly related or unrelated. In this work, we present a computational approach aimed at the identification of suitable target combinations for multitarget drug design within an ensemble of biologically relevant proteins. The target selection relies on the analysis of activity annotations present in molecular databases and on ligand-based virtual screening. A few target combinations were also inspected with structure-based methods to demonstrate that the identified dual-activity compounds are able to bind target combinations characterized by remote binding site similarities. Our approach was applied to the heat shock protein 90 (Hsp90) interactome, which contains several targets of key importance in cancer. Promising target combinations were identified, providing a basis for the computational design of compounds with dual activity. The approach can be applied to any ensemble of proteins of interest for which known inhibitors are available.
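One step of the target selection above, the analysis of activity annotations, can be illustrated by counting compounds annotated as active against both members of each target pair. This is a sketch under stated assumptions: annotations are given as a hypothetical map from compound id to the set of targets it is active against, the target names are placeholders, and the published approach additionally uses ligand-based virtual screening and structure-based inspection, which are not shown.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set, Tuple


def shared_activity_counts(
    annotations: Dict[str, Set[str]],
) -> Counter:
    """Count, for each unordered target pair, the compounds annotated as
    active against both targets. Keys are alphabetically ordered tuples."""
    counts: Counter = Counter()
    for targets in annotations.values():
        for pair in combinations(sorted(targets), 2):
            counts[pair] += 1
    return counts


def top_pairs(annotations: Dict[str, Set[str]], k: int = 5) -> list:
    """Return the k target pairs with the most dual-activity compounds."""
    return shared_activity_counts(annotations).most_common(k)
```

Pairs with many dual-activity compounds are natural candidates for closer inspection; pairs with few or none may still be interesting if virtual screening suggests compounds that bridge them.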
Prediction of Compounds in Different Local Structure–Activity Relationship Environments Using Emerging Chemical Patterns
Active compounds can participate in different local structure–activity relationship (SAR) environments and introduce different degrees of local SAR discontinuity, depending on their structural and potency relationships in data sets. Such SAR features have thus far mostly been analyzed using descriptive approaches, in particular on the basis of activity landscape modeling. However, compounds in different local SAR environments have not yet been predicted. Herein, we adapt the emerging chemical patterns (ECP) method, a machine learning approach for compound classification, to systematically predict compounds with different local SAR characteristics. ECP analysis is shown to accurately assign many compounds to different local SAR environments across a variety of activity classes covering the entire range of observed local SARs. Control calculations using random forests and multiclass support vector machines were carried out, and a variety of statistical performance measures were applied. In all instances, ECP calculations yielded comparable or better performance than controls. The approach presented herein can be applied to predict compounds that complement local SARs or to prioritize compounds with different SAR characteristics.
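The core idea behind emerging patterns, feature combinations that occur frequently in one class and rarely in others, can be sketched with binary features. This is a simplified illustration, not the published ECP method, which may define patterns differently (for example, over numerical descriptor value ranges). The names `support` and `emerging_patterns` and the thresholds `min_support`, `min_growth`, and `max_size` are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Pattern = FrozenSet[str]


def support(pattern: Pattern, compounds: List[Set[str]]) -> float:
    """Fraction of compounds that contain every feature of the pattern."""
    if not compounds:
        return 0.0
    return sum(1 for c in compounds if pattern <= c) / len(compounds)


def emerging_patterns(
    target: List[Set[str]],
    background: List[Set[str]],
    min_support: float = 0.5,
    min_growth: float = 2.0,
    max_size: int = 2,
) -> Dict[Pattern, float]:
    """Return patterns frequent in `target` whose growth rate
    (target support / background support) is at least `min_growth`."""
    features = sorted(set().union(*target))
    found: Dict[Pattern, float] = {}
    for size in range(1, max_size + 1):
        for combo in combinations(features, size):
            pattern = frozenset(combo)
            s_target = support(pattern, target)
            if s_target < min_support:
                continue  # too rare in the target class
            s_background = support(pattern, background)
            growth = float("inf") if s_background == 0 else s_target / s_background
            if growth >= min_growth:
                found[pattern] = growth
    return found
```

In a multi-class setting such as different local SAR environments, patterns would be mined for each class against the union of the others, and a compound assigned to the class whose patterns it matches most strongly.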